Technical Field

[0001] The present invention relates generally to user interactive systems, and more 
specifically, to adjusting the voice prompt of a user interactive system based upon the 
state of the user. 

Background of the Invention 
[0002] User interactive systems interact with their users through bi-directional
communication. User interactive systems may include voice response
systems in which the bi-directional communication is carried out by voice 
communication, i.e., the user speaks to the interactive system and the user interactive 
system also responds by a voice prompt. Examples of user interactive systems include 
navigation systems used in an automobile where the user asks for directions to a 
particular location by voice or by typing in a destination address and the navigation 
system responds by displaying the directions to the user along with voice instructions 
corresponding to the directions. Other examples include on-board computers used in 
automobiles to control the various functionalities (audio, air conditioning, etc.) of the 
automobile based upon interaction with the user. For example, the user may control the 
air conditioning in the automobile by interacting with the on-board computer by voice. 
The user interactive system in an automobile is sometimes called a "virtual passenger,"
since it interacts with the driver as if another passenger were present in the vehicle.

[0003] Conventional user interactive systems typically use the same tone or content 
of the voice prompt when they interact with the users. For example, when a conventional 
vehicle navigation system gives directions to a destination to a user, it will use the same 
tone (e.g., high tone or subdued tone) and same content (e.g., "Turn right at third street.") 
regardless of the user's state or driver's state, such as emotional states (happy, sad, 
excited, and the like) or other states (alert, drowsy, in a hurry, and the like). However, 
studies have shown that the interactive systems cannot communicate effectively with the 
users if they use the same tone or content of the voice prompt in their interaction with the 
users regardless of the user's state. Some conventional user interactive systems allow
the user to change the voice (actor, dialect, etc.) manually, but they cannot adjust
their voice prompts automatically by detecting the user's state.

[0004] Therefore, there is a need for a method and system for determining the user's 
state in an interactive system. There is also a need for a method and system for adjusting 
or pausing the voice prompt of the interactive system based upon the determined user's 
state, especially in a voice response system, so that more effective interaction with the
user may be accomplished.

Summary of Invention 

[0005] The present invention provides a method for adjusting a voice prompt of an 
interactive system based upon the state of a user. To this end, the method receives an 
utterance of the user, obtains utterance parameters indicating the state of the user from 
the utterance, determines the state of the user based upon the utterance parameters, and
adjusts the voice prompt output by adjusting at least one of the tone of voice of the voice
prompt, the content of the voice prompt, the speed or prosody of the voice prompt, and the
gender of the voice prompt based upon the determined state of the user. When the
interactive system is installed in an automobile, the state of the user may also be
determined by monitoring the driving conditions. To obtain the utterance parameters, the
utterance is partitioned into segments, and each segment is assigned a classification
corresponding to at least one of a plurality of states of the user.

[0006] To determine the state of the user, the method generates an utterance 
parameter vector based upon the utterance parameters, converts the utterance parameter 
vector to an indication representing the state of the user, and determines the state of the 
user based upon the indication. To generate the utterance parameter vector, the method 
determines the number of segments for each classification, and divides the number of 
segments for each classification by the total number of segments in the utterance. The 
utterance parameter vector is converted to the indication by applying a linear function to 
the utterance parameter vector to generate one of a scalar, a vector of fuzzy classes, and 
an index representing the state of the user. In case the indication is a scalar, it is 
determined that the user is in a first state if the scalar is greater than a predetermined 
threshold and that the user is in a second state if the scalar is not greater than the 
predetermined threshold. 

[0007] The method of the present invention adjusts the tone of voice of the voice 
prompt to use a tone that is consistent with the determined state of the user. 
Alternatively, the method of the present invention may adjust the content of the voice
prompt to use content that is consistent with the determined state of the user. The method 
of the present invention may also adjust the speed or prosody of the voice prompt to use 
speed or prosody that is consistent with the determined state of the user. The method of 
the present invention may also adjust the gender of the voice prompt to use a gender that 
is consistent with the determined state of the user. Also, the method of the present 
invention may adjust any combination of two or more of the tone, the content, the speed
or prosody, and the gender of the voice prompt.

[0008] The present invention also provides a system for adjusting the voice prompt of 
an interactive system based upon a state of a user. The system of the present invention 
comprises a signal processing module for obtaining utterance parameters from an utterance
received from the user, an utterance parameter vector generation module for generating 
an utterance parameter vector based upon the utterance parameters, a user state 
determination module for converting the utterance parameter vector to an indication 
representing the state of the user and determining the state of the user based upon the 
indication, and a speech waveform storage module for selecting an audio waveform of 
the voice prompt based upon the determined state of the user. 

[0009] The signal processing module obtains the utterance parameters by partitioning 
the utterance into segments and assigning a classification to each segment. The 
classification corresponds to at least one of a plurality of states of the user. The utterance 
parameter vector generation module generates the utterance parameter vector by determining the
number of segments assigned to each classification, and dividing the number of segments 
assigned to each classification by the total number of segments in the utterance. The user
state determination module converts the utterance parameter vector to an indication by 
applying a linear function to the utterance parameter vector to generate one of a scalar, a 
vector of fuzzy classes, and an index representing the state of the user. In case the 
indication is a scalar, the user state determination module also determines that the user is 
in a first state if the scalar is greater than a predetermined threshold and that the user is in 
a second state if the scalar is not greater than the predetermined threshold. 

[0010] The speech waveform storage module selects the audio waveform of the voice 
prompt to have a tone that is consistent with the determined state of the user. 
Alternatively, the speech waveform storage module selects the audio waveform of the 
voice prompt to have content that is consistent with the determined state of the user. The 
speech waveform storage module may also select the audio waveform of the voice 
prompt to be of a gender that is consistent with the determined state of the user. 

[0011] In another embodiment, the system of the present invention may include a 
speech synthesizer module for synthesizing an audio waveform of the voice prompt based 
upon the determined state of the user, instead of or in addition to the speech waveform 
storage module that selects pre-stored audio waveforms. The speech synthesizer module 
generates the audio waveform of the voice prompt to have a tone that is consistent with 
the determined state of the user. The speech synthesizer module may also generate the 
audio waveform of the voice prompt based upon content that is consistent with the 
determined state of the user. Alternatively, the speech synthesizer module may
synthesize the audio waveform of the voice prompt to be of a gender that is consistent
with the determined state of the user. 

[0012] The method and system of the present invention have the advantage that the 
voice prompt of the interactive system may be adjusted to be consistent with the user's 
emotional state, thereby appealing to the user's preferences. In case of an automobile on- 
board computer interactive system, adjusting the voice prompt to be consistent with the 
driver's state makes the driver feel comfortable and can also promote better driving and
greater alertness, confidence, and tolerance in the driver.

Brief Description of the Drawings 

[0013] The teachings of the present invention can be readily understood by 
considering the following detailed description in conjunction with the accompanying 
drawings. 

[0014] FIG. 1 is a flowchart illustrating a method for adjusting the voice prompt of an 
interactive system based upon a user's state, according to one embodiment of the present 
invention. 

[0015] FIG. 2 is a flowchart illustrating steps 106 and 108 of the flowchart of FIG. 1 
in more detail. 

[0016] FIG. 3 is a block diagram illustrating an interactive system for adjusting its 
voice prompt based upon a user's state, according to one embodiment of the present 
invention. 

[0017] FIG. 4 is a block diagram illustrating an interactive system for adjusting its
voice prompt based upon a user's state, according to another embodiment of the present 
invention. 

Detailed Description of Embodiments 

[0018] The embodiments of the present invention will be described below with 
reference to the accompanying drawings. Like reference numerals are used for like 
elements in the accompanying drawings. 

[0019] FIG. 1 is a flowchart illustrating a method for adjusting the voice prompt of an 
interactive system based upon a user's state, according to one embodiment of the present 
invention. The method of FIG. 1 determines the state of the user of an interactive system 
and adjusts the voice prompt of the interactive system based upon the determined user's 
state. In the method of FIG. 1, it will be assumed for convenience of explanation that the 
interactive system is an on-board computer of an automobile and the user is a driver of 
the automobile, although any type of interactive system may be used consistent with the 
method of FIG. 1. 

[0020] Referring to FIG. 1, the method begins 102 by receiving and storing 104 the 
utterance of the user. For example, the driver may ask the on-board computer, "How 
long will it take for me to drive to San Francisco, California?" Then, utterance 
parameters are obtained 106 from the utterance to generate 108 an utterance parameter 
vector based upon the obtained utterance parameters. Steps 106 and 108 will be 
explained in more detail with reference to FIG. 2, which is a flowchart illustrating steps 
106 and 108 of the flowchart of FIG. 1 in more detail. 

[0021] Referring to FIG. 2, in step 106 the utterance is partitioned 202 into segments.
Each segment is a phrase of the utterance containing at least a minimum number of
phonemes. The starting and ending points of a segment may be determined by detecting a
pause, a silence, or a sudden change in the utterance. The segments may be of uniform or
non-uniform length.
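
As a rough illustration of step 202, the following minimal sketch splits a mono sample sequence into segments wherever a sufficiently long run of low-energy frames (a pause or silence) occurs. The function name, the frame length, the energy threshold, and the minimum pause length are hypothetical parameters chosen for illustration. The sketch does not detect boundaries at a "sudden change" in the utterance, and it does not enforce a minimum phoneme count.

```python
from typing import List, Tuple

def segment_by_pauses(samples: List[float], frame_len: int = 160,
                      energy_threshold: float = 1e-3,
                      min_pause_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of the voiced segments.

    A frame counts as silent when its mean energy is below energy_threshold
    (hypothetical value); a run of at least min_pause_frames silent frames
    closes the current segment. Shorter gaps do not split a segment.
    """
    n_frames = len(samples) // frame_len
    segments: List[Tuple[int, int]] = []
    start = None      # first sample of the segment being built, if any
    silent_run = 0    # consecutive silent frames seen inside that segment
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # End the segment at the first silent frame of the pause.
                segments.append((start, (i - silent_run + 1) * frame_len))
                start, silent_run = None, 0
    if start is not None:
        # Close a segment that runs to the end, trimming any trailing silence.
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments
```

For example, a signal of 10 voiced frames, a 5-frame pause, 8 voiced frames, and 2 trailing silent frames yields two segments. A real system would more likely combine energy with spectral-change cues.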

[0022] In one embodiment, each segment is assigned 204 a classification indicating 
one of a plurality of states of a user. For example, the classifications may include P1
(truth), P2 (stress), P3 (excitement), P4 (unsure), P5 (very stressed), P6 (voice control),
P7 (tense), P8 (very tense), P9 (inaccurate), PA (implausible), PB (deceiving), PC
(speech speed), PD (pause ratio), PE (clearness), PF (drowsy), PG (tired), PH
(hesitation), PI (variance of the pitch during a segment), PJ (difference in pitch from one
segment to the next segment), and PK (shape of the frequency spectrum in the segment).
The assignment of these classifications to the segments of the utterance may be carried
out by various commercially available lie detection software packages. The correspondence
of particular emotional parameters to certain types of segments of the utterance is
determined empirically by inducing a certain type of emotion in a user, eliciting an
utterance from the user, analyzing the segments of the utterance, and statistically
mapping the segments of the utterance to the type of induced emotional state of the user.
Some of these parameters may vary together and thus may be grouped to reduce the 
number of utterance parameters. For example, PC (speech speed) and PD (pause ratio) 
may be grouped together. 

[0023] In another embodiment, the segments may correspond to words and pauses in
the utterance, and each word may be classified, based on speech recognition, either as a
general word or as a particular type of emotionally sensitive word. For example,
in the utterance "Uhh, find a gas station," a classification such as "frustration word" may
be assigned to "Uhh" and the classification "general word" may be assigned to the
remaining words "find a gas station." For another example, in the utterance "Find me
. . . [pause] a gas station nearby," a classification such as "at ease" or "pause" may be
assigned to the [pause] in the utterance and the classification "general word" may be
assigned to the remaining words "Find me a gas station nearby."

[0024] In step 108, the number of segments assigned to each classification is
determined 206 and then divided 208 by the total number of segments in the
utterance to generate the elements of the utterance parameter vector. For example, steps
202, 204, 206, and 208 may result in the following elements of the utterance parameter
vector: P1 (0.48), P2 (0.13), P3 (0), P4 (0.10), P5 (0), P6 (0), P7 (0.07), P8 (0.03), PA
(0.14), PB (0.03), and PC (0.02). The utterance parameter vector has a dimension
corresponding to the number of classifications and represents the user's state, e.g., happy,
sad, excited, subdued, and the like. In this example, the utterance parameter vector V1
becomes:

$$V1 = 0.48 \cdot P1 + 0.13 \cdot P2 + 0 \cdot P3 + 0.10 \cdot P4 + 0 \cdot P5 + 0 \cdot P6 + 0.07 \cdot P7 + 0.03 \cdot P8 + 0.14 \cdot PA + 0.03 \cdot PB + 0.02 \cdot PC$$

In another embodiment, the utterance parameters may also be analyzed on an emotion
axis such as Stress/Relax, Happy/Sad, Excited/Calm, Tired/Aroused, and Sleepy/Awake.
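
Steps 206 and 208 amount to a normalized histogram of segment labels. The following minimal sketch assumes that step 204 has already produced one classification label per segment. The function and variable names are hypothetical, and the 100-segment label list is an invented illustration whose counts happen to reproduce the example values listed in paragraph [0024].

```python
from collections import Counter
from typing import Dict, List

def utterance_parameter_vector(labels: List[str], classes: List[str]) -> Dict[str, float]:
    """Return, for each classification, the fraction of segments assigned to it.

    labels  -- one classification label per segment (output of step 204)
    classes -- the classifications that form the dimensions of the vector
    """
    if not labels:
        raise ValueError("utterance contains no segments")
    counts = Counter(labels)   # step 206: number of segments per classification
    total = len(labels)        # total number of segments in the utterance
    # Step 208: divide each count by the total; unseen classifications get 0.0.
    return {c: counts[c] / total for c in classes}

CLASSES = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "PA", "PB", "PC"]

# Hypothetical 100-segment utterance, for illustration only.
example_labels = (["P1"] * 48 + ["P2"] * 13 + ["P4"] * 10 + ["P7"] * 7 +
                  ["P8"] * 3 + ["PA"] * 14 + ["PB"] * 3 + ["PC"] * 2)
v1 = utterance_parameter_vector(example_labels, CLASSES)
```

Because every element is a fraction of the same total, the elements sum to 1 whenever every segment carries one of the listed classifications.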

[0025] In the other embodiment, in which emotionally sensitive words in the utterance
are classified based upon speech recognition, steps 202, 204, 206, and 208 may result in
the following elements of the utterance parameter vector: NGW (0.9), NFW (0.03), and NP
(0.07), where NGW is the number of general words divided by N, NFW is the number of
frustration words divided by N, NP is the number of pause segments divided by N, and
N is the sum of the total number of words and the total number of pause segments in the
utterance. The utterance parameter vector V2 becomes:

V2 = 0.9 • NGW + 0.03 • NFW + 0.07 • NP. 
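
The word-and-pause embodiment can be sketched in the same way. The following code assumes that the speech recognizer emits a list of word tokens in which each pause segment appears as a special token. The `<pause>` token, the small frustration-word lexicon, and the function name are all hypothetical; the source does not specify them.

```python
from typing import Dict, List

# Hypothetical lexicon; a real system would derive its word lists empirically.
FRUSTRATION_WORDS = {"uhh", "ugh", "argh"}
# Hypothetical token assumed to be emitted by the recognizer for a pause segment.
PAUSE = "<pause>"

def word_pause_vector(tokens: List[str]) -> Dict[str, float]:
    """Compute NGW, NFW, and NP from recognized words and pause segments."""
    n = len(tokens)  # N = number of words + number of pause segments
    if n == 0:
        raise ValueError("utterance contains no words or pauses")
    n_pause = sum(1 for t in tokens if t == PAUSE)
    n_frustration = sum(
        1 for t in tokens
        if t != PAUSE and t.lower().strip(",.!?") in FRUSTRATION_WORDS
    )
    n_general = n - n_pause - n_frustration  # every remaining word is a general word
    return {"NGW": n_general / n, "NFW": n_frustration / n, "NP": n_pause / n}

v_uhh = word_pause_vector(["Uhh,", "find", "a", "gas", "station"])
v_pause = word_pause_vector(["Find", "me", PAUSE, "a", "gas", "station", "nearby"])
```

For "Uhh, find a gas station", one of five tokens is a frustration word, so NFW = 0.2 and NGW = 0.8.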

[0026] Although particular ways of generating the utterance parameter vector are
described herein, the utterance parameter vector may be generated in other manners as
long as the elements of the utterance parameter vector indicate the degree of significance
of each utterance parameter.

[0027] Referring back to FIG. 1, the utterance parameter vector V1 is converted 110
to an indication representing the user's state, using a function. The function used to
generate the indication of the user's state is also derived empirically. For
example, the linear function S1 = ((1 - P2) + P6 + (1 - P7) + PA + PB + PC) / 6 is used
in one embodiment of the present invention. Note that some of the detected utterance
parameters are not used in the linear function S1 and that some of the detected utterance
parameters may be grouped together, because only certain user states may be of interest
to the method of the present invention. For example, utterance
parameters corresponding to the same emotion axis (Stress/Relax, Happy/Sad,
Excited/Calm, Tired/Aroused, or Sleepy/Awake) may be grouped together. In the
example described herein, the indication S1 = ((1 - 0.13) + 0 + (1 - 0.07) + 0.14 + 0.03 +
0.02) / 6 = 1.99 / 6 = 0.3317 (approximately). For the linear function S1, the user is
determined to be happier if the scalar indication S1 is near 1 and sad if the scalar
indication S1 is near 0.

[0028] In step 111, information on the driving conditions may be received when the
method is used with a navigation system in an automobile. Driving condition
information may include, for example, how long the user has been driving, the time of
day, how winding or straight the road is, the location of the automobile, and the like.
This information reflects the observations that long hours of driving tend to make the
driver more tired, night-time driving tends to make the driver sleepier, and long stretches
of straight road tend to make the driver bored. Driving condition information may also
include environmental factors such as weather conditions (rain, ice), road quality, tire
quality, traffic speed along the road, heater or air conditioner operation, the condition of
the windows (open or closed), the type of content being played in an in-car entertainment
system such as a car radio, and driving performance measures such as lane weaving,
turning, braking, and the like. Step 111 is optional but may help in determining the state
of the user more accurately.

[0029] Then, the user's state is determined 112 based upon the indication. In one
embodiment of the invention, the user is determined to be happy if the indication S1 is
above 0.35. If the indication S1 is not above 0.35, the user is determined to be
sad. The threshold value used to determine the user's state (0.35 herein, for happy vs.
sad) is also derived empirically.
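
Paragraphs [0027] and [0029] together define a scoring step and a thresholding step. The following minimal sketch represents the utterance parameter vector as a dictionary keyed by classification name; this layout and the function names are assumptions. The default threshold of 0.35 is the empirically derived value stated in [0029]. Evaluated on the example vector V1 from [0024], S1 is 1.99 / 6, or approximately 0.3317. Because this value is not above 0.35, the example utterance would be classified as sad.

```python
from typing import Dict

def s1(v: Dict[str, float]) -> float:
    """Linear indication S1 from [0027]: near 1 means happier, near 0 means sadder."""
    return ((1 - v["P2"]) + v["P6"] + (1 - v["P7"]) + v["PA"] + v["PB"] + v["PC"]) / 6

def happy_or_sad(indication: float, threshold: float = 0.35) -> str:
    """Step 112: 'happy' if the indication is above the threshold, otherwise 'sad'."""
    return "happy" if indication > threshold else "sad"

example_v1 = {"P1": 0.48, "P2": 0.13, "P3": 0.0, "P4": 0.10, "P5": 0.0, "P6": 0.0,
              "P7": 0.07, "P8": 0.03, "PA": 0.14, "PB": 0.03, "PC": 0.02}
score = s1(example_v1)       # 1.99 / 6, approximately 0.3317
state = happy_or_sad(score)  # "sad"
```

Passing a different threshold allows the same thresholding helper to serve other indications with their own empirically chosen cut-off values.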

[0030] Although a particular linear function S1 and a particular threshold
value (0.35) are described herein as an example, it should be noted that any type of
function may be used as long as the function and the threshold value appropriately map
the utterance parameter vector to the various user states. For example, another linear
function that may be used for the utterance parameter vector V1 is S2 = 1 - P7. For the
linear function S2, the user is happier if the indication S2 is near 1 and sad if the
indication S2 is near 0. With the linear function S2, the method determines that the user
is in a happy state if the indication S2 is larger than 0.90 and in a sad state if the
indication S2 is not larger than 0.90. As another example, another linear function that
may be used for the utterance parameter vector V2 is S3 = 1 - NP. For the linear function
S3, the user is not at ease if the indication S3 is near 1 and at ease if the indication S3
is near 0. With the linear function S3, the method determines that the user is in an alert
state if the indication S3 is larger than 0.95 and in a sleepy state if the indication S3 is
not larger than 0.95. The gender of the speaker of the utterance may also be determined
by analyzing the fundamental frequency of the utterance, because the fundamental
frequency of a female voice is typically about twice that of a male voice.
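
The fundamental-frequency test in [0030] is not spelled out in the source. The following sketch swaps in one common pitch-estimation approach: it picks the autocorrelation peak within a plausible voice range and then applies a fixed boundary. The 60 to 400 Hz search range and the 165 Hz male/female boundary are hypothetical values, and the pure sine waves stand in for voiced speech segments. This is a crude heuristic, not a robust gender classifier.

```python
import math
from typing import List

def estimate_f0(samples: List[float], sample_rate: int,
                f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Estimate F0 in Hz as the autocorrelation peak within [f_min, f_max]."""
    n = len(samples)
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation; its slight bias toward shorter lags
        # helps avoid selecting a multiple of the true pitch period.
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag

def guess_gender(f0_hz: float, boundary_hz: float = 165.0) -> str:
    """Hypothetical boundary: female voices typically have a higher F0 than male voices."""
    return "female" if f0_hz >= boundary_hz else "male"

def tone(freq_hz: float, sample_rate: int = 8000, seconds: float = 0.1) -> List[float]:
    """Synthetic test signal: a pure sine wave standing in for a voiced segment."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(int(sample_rate * seconds))]

low_f0 = estimate_f0(tone(120.0), 8000)
high_f0 = estimate_f0(tone(220.0), 8000)
```

In practice, F0 would be averaged over the voiced segments of the utterance, and the boundary would be tuned empirically, as the source does for its other thresholds.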

[0031] The driving condition information received in step 111 may also be 
considered in determining the user's state. For example, certain driving condition 
information may be used to weight the utterance parameter vectors higher or lower. The 
manner in which the driving condition information should be used to weight the utterance 
parameter vectors is determined empirically. 
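
The source leaves the driving-condition weighting to empirical tuning. The following sketch is therefore purely hypothetical. It raises the weight of the drowsiness (PF) and tiredness (PG) elements after a long drive or at night, and then renormalizes the vector so that its elements remain fractions that sum to 1. The two-hour cutoff, the weight of 1.5, and the function names are illustrative assumptions, not values from the source.

```python
from typing import Dict

def driving_condition_weights(hours_driving: float, is_night: bool) -> Dict[str, float]:
    """Hypothetical weights that emphasize tiredness (PG) and drowsiness (PF) cues."""
    weights: Dict[str, float] = {}
    if hours_driving > 2.0:   # hypothetical cutoff for a "long" drive
        weights["PG"] = 1.5   # tired
    if is_night:
        weights["PF"] = 1.5   # drowsy
    return weights

def weight_vector(v: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Scale each element by its weight (default 1.0) and renormalize to sum to 1."""
    weighted = {k: x * weights.get(k, 1.0) for k, x in v.items()}
    total = sum(weighted.values())
    if total == 0:
        return weighted
    return {k: x / total for k, x in weighted.items()}

night_v = weight_vector({"P1": 0.6, "PF": 0.2, "PG": 0.2},
                        driving_condition_weights(hours_driving=3.0, is_night=True))
```

Renormalizing is one design choice among several. Leaving the vector unnormalized and adjusting the threshold instead would serve equally well.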

[0032] Although the indication has been described herein as a scalar on a linear scale,
the indication may also be in the form of a vector of fuzzy classes derived from
fuzzy processing of the utterance parameter vector. For example, the indication may be
a vector such as [probability of being at ease (0.8), probability of frustration (0.1),
. . .]. The indication may also be in the form of an index derived from fuzzy processing of
the utterance parameter vector. For example, the indication may be an integer index
from 0 through 5, where 0 represents a calm male voice, 1 represents a calm female
voice, 2 represents an aroused male voice, 3 represents an aroused female voice, 4
represents a neutral male voice, and 5 represents a neutral female voice.
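
The 0 to 5 index in [0032] follows a regular pattern: twice an arousal code plus a gender code. The following sketch pairs that encoding with maximum-membership defuzzification, which is one simple way to collapse a fuzzy class vector into a single class. The source does not specify the fuzzy processing, so the code tables and function names are assumptions.

```python
from typing import Dict

AROUSAL_CODES = {"calm": 0, "aroused": 1, "neutral": 2}
GENDER_CODES = {"male": 0, "female": 1}

def dominant_class(fuzzy: Dict[str, float]) -> str:
    """Defuzzify by picking the class with the largest membership value."""
    return max(fuzzy, key=lambda c: fuzzy[c])

def voice_index(arousal: str, gender: str) -> int:
    """Map (arousal, gender) to the 0-5 index: 2 * arousal code + gender code."""
    return 2 * AROUSAL_CODES[arousal] + GENDER_CODES[gender]

arousal = dominant_class({"calm": 0.7, "aroused": 0.2, "neutral": 0.1})
index = voice_index(arousal, "male")  # calm male voice -> 0
```

Each of the six (arousal, gender) pairs maps to a distinct integer, which matches the ordering listed in [0032].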

[0033] Thereafter, the voice prompt of the interactive system is adjusted 114 based 
upon the determined user's state. Studies show that happy users prefer a happy or aroused
tone of voice or way of speaking (content) and that sad users prefer a sad or subdued tone
of voice and way of speaking (content) for the voice prompt. Studies also show that
tones and content consistent with the user's state promote better driving, when the 
interactive system is in an automobile. Studies also show that male drivers prefer male 
voice prompts and female drivers prefer female voice prompts in certain cultures and vice 
versa in other cultures. Also, a very aroused voice prompt may cause loss of attention 
resulting in unsafe driving. 

[0034] Depending upon the culture and the driver's preference, the method of the 
present invention adjusts the tone of voice, content (way of speaking), speed (prosody), 
and/or gender of the voice prompt of the interactive system so that they are consistent 
with the determined user's state. The method of the present invention may adjust one of
the tone of voice, content, prosody, and gender of the voice prompt or any combination of
the tone, content, prosody, and gender of the voice prompt based upon the determined 
user's state. The method of the present invention may also pause the voice prompt based 
upon the determined user's state. 

[0035] For example, if the user is in a happy state, the interactive system may use a
rather happy or aroused tone, and if the user is in a sad state, the interactive system may
use a low or subdued tone. Also, the interactive system may change the content of
the voice prompt to "Travel time is 30 minutes" if the user is in a happy state but change 
the content to "Don't worry, travel time is 30 minutes" to a user in a sad state. As 
another example, the interactive system may adjust both the tone and content of the voice 
prompt, to use "Travel time is 30 minutes" in a happy voice to a happy user but use 
"Don't worry, travel time is 30 minutes" in a sad voice to a sad user. As still another 
example, the method of the present invention may use a male voice prompt to a male user 
and a female voice prompt to a female user. It should be noted that any type of 
adjustment to the voice prompt may be done based upon the user's determined state in 
order to achieve effective interaction between the user and the interactive system and 
serve the user's preferences on the voice prompt of the interactive system. 
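
The adjustments in [0035] can be sketched as a small selection function that returns the content, tone, and voice gender together. The data class, the tone labels, and the `same_gender_preferred` flag are hypothetical. The flag reflects the observation in [0033] that gender preference depends on culture. The prompt texts are the ones given in [0035].

```python
from dataclasses import dataclass

@dataclass
class VoicePrompt:
    text: str    # content (way of speaking)
    tone: str    # tone of voice
    gender: str  # gender of the prompt voice

def adjust_prompt(travel_minutes: int, user_state: str, user_gender: str,
                  same_gender_preferred: bool = True) -> VoicePrompt:
    """Choose content, tone, and voice gender consistent with the user's state."""
    if user_state == "happy":
        text = f"Travel time is {travel_minutes} minutes"
        tone = "happy"
    else:  # this two-state sketch treats every other state as sad
        text = f"Don't worry, travel time is {travel_minutes} minutes"
        tone = "subdued"
    if same_gender_preferred:
        gender = user_gender
    else:
        gender = "female" if user_gender == "male" else "male"
    return VoicePrompt(text, tone, gender)

happy_prompt = adjust_prompt(30, "happy", "male")
sad_prompt = adjust_prompt(30, "sad", "female", same_gender_preferred=False)
```

Prosody (speed) or a pause decision could be added as further fields in the same way.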

[0036] The method of the present invention has the advantage that the voice prompt 
of the interactive system may be adjusted to be consistent with the user's emotional state, 
thereby appealing to the user's preferences. In case of an automobile on-board computer 
interactive system, adjusting the voice prompt to be consistent with the driver's state 
makes the driver feel comfortable and can also promote better driving by encouraging
alertness, confidence, and tolerance.

[0037] FIG. 3 is a block diagram illustrating an interactive system (virtual passenger) 
for adjusting its voice prompt based upon a user's state, according to one embodiment of 
the present invention. The interactive system 300 may be, for example, an on-board
computer in an automobile that is used as a "virtual passenger" for controlling the various
functionalities of the automobile. Referring to FIG. 3, the interactive system 300
includes a microphone 302, a signal processing module 304, a controller 306, a speaker
308, an O/S module 312, an utterance parameter vector generation module 314, a user
state determination module 316, and a speech waveform storage module 318, which are
interconnected via a bus 330. The system 300 may also be off-board the automobile, for
example, within a call center wirelessly connected to the automobile via a cellular
telephone or other wireless communication channel. The system 300 may also be
partitioned such that some parts of the system 300 (e.g., the microphone and speaker) are
on-board the automobile and other parts of the system 300 are off-board the automobile
within a call center wirelessly connected to the automobile via a cellular telephone or
other wireless communication channel.

[0038] The microphone 302 receives an utterance from a user in the form of an 
acoustic signal 301 and converts it to an electrical signal 303 that is passed on to the 
signal processing module 304. The signal processing module 304 partitions the utterance 
303 into segments and assigns a classification to each segment, for example, as illustrated 
with respect to step 106 of FIG. 1 and steps 202 and 204 of FIG. 2, to obtain utterance 
parameters from the utterance. The signal processing module 304 may also include
speech recognition capabilities incorporated therein. The signal processing module 304 
may be stand-alone signal processing circuitry or a memory device storing signal 
processing software run by the controller 306. The signal processing module 304 
provides the obtained utterance parameters 305 to the utterance parameter vector 
generation module 314. 

[0039] The utterance parameter vector generation module 314 generates an utterance
parameter vector using the obtained utterance parameters 305 by counting the number of 
segments for each classification and dividing the number of segments for each 
classification by the total number of segments in the utterance, for example, as illustrated 
with respect to step 108 of FIG. 1 and steps 206 and 208 of FIG. 2. The utterance 
parameter vector generation module 314 provides the utterance parameter vector 315 to 
the user state determination module 316. The utterance parameter vector generation 
module 314 can be dedicated circuitry for generating the utterance parameter vector or a 
memory device storing software for generating the utterance parameter vector and run by 
the controller 306. 

[0040] The user state determination module 316 receives the utterance parameter
vector 315 from the utterance parameter vector generation module 314. The user state
determination module 316 converts the utterance parameter vector to an indication
representing the user's state using a linear function, for example, as described with
respect to step 110 of FIG. 1. The user state determination module 316 also determines
the user's state based upon the indication, for example, as described with respect to step
112 of FIG. 1. For example, the user state determination module 316 may determine that
the user is happy if the indication is above a predetermined threshold value and determine 
that the user is sad if the indication is not above the predetermined threshold value. The 
user state determination module 316 may also receive driving condition information as 
described in step 111 of FIG. 1 and use such driving condition information in 
determining the state of the user. The driving condition information may be detected and 
generated by a navigation system (not shown) or various sensors (not shown). 

[0041] Once the user's state is determined, the user's state information 317 is passed
on to the speech waveform storage module 318. The speech waveform storage module
318 is a memory device storing a plurality of sets of speech content in various tones and
genders. The speech waveform storage module 318 also receives selected information
content from an external source (not shown) or the controller 306. The speech waveform
storage module 318 also stores software for selecting the appropriate waveform based on
the determined user state and the received information content, under the control of the 
controller 306. The information content or the manner in which the information content 
is selected or input to the speech waveform storage module is not the subject of the 
present invention, and it is assumed herein that the speech waveform storage module 318 
receives the selected information content, i.e., the message of the voice prompt that needs 
to be conveyed to the user. The information content may be, for example, an indication 
to the user that he will have to travel 30 minutes to reach the destination (information 
content). 

[0042] In response, the speech waveform storage module 318 selects a voice prompt
audio waveform 321 that corresponds to the received information content and is
consistent with the determined user state. For example, if the user is in a happy state, the
speech waveform storage module 318 may select and output a voice prompt 321 "Even
without hurrying, you will arrive in 30 minutes" in a happy voice. On the other hand, if
the user is in a sad state, the speech waveform storage module 318 may select and output
the voice prompt 321 "I think you can arrive in 30 minutes" in a sad voice. Although the
information content in these voice prompts is the same (an indication to the user that he
will have to travel 30 minutes to reach the destination), the content (different sentences)
of the voice prompt and the tone of voice (happy or sad tone of voice) of the voice 
prompt and/or the speed (prosody) of the voice prompt are selected differently by the 
speech waveform storage module 318 based upon the determined user state information 
317. The voice prompt audio waveform 321 selected by the speech waveform storage 
module 318 is passed on to the speaker 308. The speaker 308 outputs the selected voice 
prompt 309 to the user. 
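
The selection performed by the speech waveform storage module 318 can be sketched as a lookup keyed by information content, user state, and voice gender. The following in-memory class is a hypothetical stand-in. The key layout, the `eta_30min` content identifier, and the placeholder byte strings are illustrative, and falling back to a "neutral" recording when no waveform exists for the detected state is an assumed design choice that the source does not describe.

```python
from typing import Dict, Tuple

# (information content id, user state, voice gender) -> stored audio waveform
WaveformKey = Tuple[str, str, str]

class WaveformStore:
    """Hypothetical in-memory stand-in for speech waveform storage module 318."""

    def __init__(self, fallback_state: str = "neutral") -> None:
        self._waveforms: Dict[WaveformKey, bytes] = {}
        self._fallback_state = fallback_state

    def add(self, content_id: str, state: str, gender: str, waveform: bytes) -> None:
        self._waveforms[(content_id, state, gender)] = waveform

    def select(self, content_id: str, state: str, gender: str) -> bytes:
        """Return the waveform for the user state, else the fallback-state recording."""
        key = (content_id, state, gender)
        if key in self._waveforms:
            return self._waveforms[key]
        # Raises KeyError if no fallback recording was stored either.
        return self._waveforms[(content_id, self._fallback_state, gender)]

store = WaveformStore()
store.add("eta_30min", "happy", "female", b"<happy waveform>")
store.add("eta_30min", "neutral", "female", b"<neutral waveform>")
```

The speech synthesizer embodiment would replace this lookup with synthesis parameterized by the same three keys.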

[0043] The controller 306 controls the operation of the various components in the 
interactive system 300, including the microphone 302, signal processing module 304, 
speaker 308, O/S module 312, utterance parameter vector generation module 314, user 
state determination module 316, and speech waveform storage module 318 via the bus
330. To this end, the controller 306 executes instructions or programs stored in the O/S 
module 312 as well as the signal processing module 304, utterance parameter vector 
generation module 314, user state determination module 316, and speech waveform 
storage module 318 to provide the various functionalities of the interactive system 300, 
such as determining the user state and adjusting the voice prompt of the interactive
system based upon the determined user state. The O/S module 312 stores the operating 
system for the interactive system 300. 
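The controller 306 sequences the FIG. 3 modules over the bus 330. In the simplified sketch below, each module is modeled as a plain callable, and the controller's role is modeled as one method that invokes the callables in order. The class name `InteractiveSystem`, the field names, and the call signatures are assumptions made for illustration only. Real modules would exchange audio buffers and parameter vectors over the bus and would not pass Python lists.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class InteractiveSystem:
    """Hypothetical wiring of the FIG. 3 modules as callables (signatures assumed)."""

    process_signal: Callable[[List[float]], List[float]]  # signal processing module 304
    make_vector: Callable[[List[float]], List[float]]     # utterance parameter vector generation module 314
    determine_state: Callable[[List[float]], str]         # user state determination module 316
    select_waveform: Callable[[str, str], str]            # speech waveform storage module 318
    play: Callable[[str], None]                           # speaker 308

    def handle_utterance(self, samples: List[float], content_id: str) -> Tuple[str, str]:
        """Plays the controller's role: run each module in turn on one utterance."""
        cleaned = self.process_signal(samples)
        vector = self.make_vector(cleaned)
        state = self.determine_state(vector)
        waveform = self.select_waveform(content_id, state)
        self.play(waveform)
        return state, waveform
```

With stub callables, the pipeline can be exercised without audio hardware. One example is a `determine_state` that returns "happy" when the mean of the parameter vector is positive.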

[0044] The system 300 may also include a display device (not shown) that is coupled 
to the controller 306 via the bus 330 and displays a graphical character corresponding to 
the voice prompt. In such case, the nature of the graphical character may also be adjusted 
based upon the user's determined state. For example, a happy character may be used in 
the case of a happy user state and a sad character may be used in the case of a sad user state.

[0045] The interactive system of the present invention has the advantage that the 
voice prompt of the interactive system may be adjusted to be consistent with the user's 
emotional state, thereby appealing to the user's preferences. In the case of an automobile
on-board computer interactive system, adjusting the voice prompt to be consistent with the
driver's state makes the driver feel comfortable and can also improve driving by
promoting alertness, confidence, and tolerance.

[0046] FIG. 4 is a block diagram illustrating an interactive system for adjusting its 
voice prompt based upon a user's state, according to another embodiment of the present 
invention. The interactive system 400 may be, for example, an on-board computer in an 
automobile that is used as a "virtual passenger" for controlling the various functionalities 
of the automobile. The system 400 may also be off-board the automobile, for example, 
within a call center wirelessly connected to the automobile via a cellular telephone or 
other wireless communication channel. The system 400 may also be partitioned such that
parts (e.g., microphone, speaker, etc.) of the system 400 are on-board the automobile and
other parts of the system 400 are off-board the automobile within a call center wirelessly
connected to the automobile via a cellular telephone or other wireless communication
channel.

[0047] Referring to FIG. 4, the interactive system 400 includes a microphone 402, a 
signal processing module 404, a controller 406, a speaker 408, an O/S module 412, an 
utterance parameter vector generation module 414, a user state determination module 
416, a speech synthesizer module 418, and a speech storage module 420. The interactive
system 400 of FIG. 4 is identical to the interactive system 300 of FIG. 3, except that the 
interactive system 400 includes the speech synthesizer module 418 and the speech 
storage module 420, rather than the speech waveform storage module 318, to generate the 
adjusted voice prompt. 

[0048] Referring to FIG. 4, the speech synthesizer module 418 receives the
determined user state information 417 from the user state determination module 416 and 
the selected information content from an external source (not shown) or the controller 
406. The information content or the manner in which the information content is selected 
or input to the speech synthesizer module 418 is not the subject of the present invention 
and it is assumed herein that the speech synthesizer module 418 receives the selected 
information content, i.e., the message of the voice prompt that needs to be conveyed to 
the user. The selected information content may be, for example, an indication to the user 
that he will have to travel 30 minutes to reach the destination (information content). 

[0049] Once the information content and the determined user state are received by the 
speech synthesizer module 418, the speech synthesizer module 418 retrieves the speech 
419 corresponding to the information content from the speech storage module 420. The
speech synthesizer module 418 may retrieve different speech (different sentences) 419
depending upon the determined user's state. The speech synthesizer module 418 also 
stores software for synthesizing the appropriate audio waveform 421 for the voice prompt 
based on the determined user state and the retrieved speech corresponding to the 
information content, under the control of the controller 406. The speech synthesizer 
module 418 synthesizes the audio waveform 421 corresponding to the retrieved speech in
a tone of voice that is consistent with the determined user state, for example by changing
the tone generation model or the gender of the voice prompt. The synthesized audio
waveform 421 of the voice prompt is output to the user through the speaker 408. 
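The description does not say how the speech synthesizer module 418 receives the tone or the gender it should use. One common way to pass such settings to a text-to-speech engine is W3C SSML (Speech Synthesis Markup Language), which provides the `<prosody>` element with `pitch` and `rate` attributes and the `<voice>` element with a `gender` attribute. The sketch below assumes that approach. The mapping in `SYNTHESIS_SETTINGS` and the function `build_ssml` are hypothetical, and the pairing of each user state with a particular pitch, rate, and gender is arbitrary.

```python
from xml.sax.saxutils import escape

# Illustrative settings only: the states and the pitch/rate/gender chosen for
# each are assumptions, expressed with standard SSML label values.
SYNTHESIS_SETTINGS = {
    "happy": {"pitch": "high", "rate": "fast", "gender": "female"},
    "sad": {"pitch": "low", "rate": "slow", "gender": "male"},
}


def build_ssml(sentence: str, state: str) -> str:
    """Wrap the retrieved sentence in SSML whose voice and prosody follow the user state."""
    s = SYNTHESIS_SETTINGS[state]
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice gender="{s["gender"]}">'
        # escape() keeps characters such as "&" or "<" in the sentence valid XML.
        f'<prosody pitch="{s["pitch"]}" rate="{s["rate"]}">{escape(sentence)}</prosody>'
        "</voice>"
        "</speak>"
    )
```

The resulting markup would then be handed to whatever SSML-capable synthesizer the system uses. Engines differ in which attributes they honor, so this is a sketch rather than a complete integration.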

[0050] For example, if the user is in a happy state, the speech synthesizer
module 418 may retrieve the speech 419 "Even without hurrying, you will arrive in 30 
minutes" from the speech storage module 420. On the other hand, if the user is in a sad 
state, the speech synthesizer module 418 may retrieve the speech 419 "I think you can 
arrive in 30 minutes" from the speech storage module 420. Although the information 
content of the retrieved speech is the same (an indication to the user that he will have to 
travel 30 minutes to reach the destination), the content (different sentences) of the 
retrieved speech is selected differently from the speech storage module 420 based upon the
determined user state information 417. Then, if the user is in a happy state, the speech 
synthesizer module 418 may synthesize and output an audio waveform 421 for the voice 
prompt 421 "Even without hurrying, you will arrive in 30 minutes" in a happy voice. On 
the other hand, if the user is in a sad state, the speech synthesizer module 418 may 
synthesize and output an audio waveform for the voice prompt 421 "I think you can 
arrive in 30 minutes" in a sad voice. Thus, both the content (sentences) and the tone of
voice of the voice prompt are synthesized differently depending upon the determined user 
state, although the messages of the speech content in the voice prompts are identical or 
similar to each other. 
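In paragraph [0050], the FIG. 4 system works in two steps. It first retrieves state-specific wording from the speech storage module 420, and it then synthesizes that wording in a matching tone. The sketch below models only the planning step: it produces a request that a synthesizer could consume. The table `SPEECH_STORAGE`, the `SynthesisRequest` type, the function `plan_voice_prompt`, and the rule that the tone simply mirrors the user state are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical contents of the speech storage module 420, keyed by
# (information content, user state).
SPEECH_STORAGE = {
    ("eta_30_min", "happy"): "Even without hurrying, you will arrive in 30 minutes",
    ("eta_30_min", "sad"): "I think you can arrive in 30 minutes",
}


@dataclass(frozen=True)
class SynthesisRequest:
    content_id: str  # the message to convey (unchanged across user states)
    sentence: str    # wording retrieved for this user state
    tone: str        # tone of voice the synthesizer should use


def plan_voice_prompt(content_id: str, state: str) -> SynthesisRequest:
    """Retrieve state-specific wording, then pair it with a matching tone."""
    sentence = SPEECH_STORAGE[(content_id, state)]
    # Assumption: the tone mirrors the user state, as in the happy/sad example.
    return SynthesisRequest(content_id=content_id, sentence=sentence, tone=state)
```

Both requests carry the same `content_id`, so the message is unchanged while the sentence and the tone differ. This matches the description's observation that the messages are identical or similar even though the voice prompts differ.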

[0051] Although the present invention has been described above with respect to 
several embodiments, various modifications can be made within the scope of the present 
invention. For example, utterance parameters other than those described herein may be 
used to determine the user state. Although the present invention generates utterance 
parameter vectors to determine the user state, other methods, such as look-up tables and
the like, may be used. Various functions other than those described herein may be used
to convert the utterance parameters to various types of indications for determination of the
user state, to the extent that the indications appropriately map the utterance parameters to 
the user states. Although the present invention describes adjusting the tone, content, 
speed or prosody, and/or the gender of the voice prompt, other attributes (e.g., volume, 
age, etc.) of the voice prompt may be adjusted as well. Also, when different user 
interfaces (e.g., characters in a video interface, smell, or tactile communication) other than
voice, are used in the interactive system, such user interfaces may also be adjusted based 
upon the determined user state. Accordingly, the disclosure of the present invention is 
intended to be illustrative, but not limiting, of the scope of the invention, which is set 
forth in the following claims. 
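Paragraph [0051] mentions look-up tables as an alternative to utterance parameter vectors for determining the user state. The sketch below is a minimal version of that alternative. It quantizes two utterance parameters into coarse bins and looks up a state from the bins. The choice of parameters (speaking rate and mean pitch), the threshold values, the "neutral" fallback state, and the names `STATE_TABLE` and `lookup_user_state` are all hypothetical. None of them come from the description.

```python
# Illustrative thresholds; neither the parameters nor the values come from the description.
RATE_THRESHOLD = 3.5     # speaking rate, syllables per second (assumed)
PITCH_THRESHOLD = 180.0  # mean fundamental frequency, Hz (assumed)

# Hypothetical look-up table from quantized utterance parameters to a user state.
STATE_TABLE = {
    ("fast", "high"): "happy",
    ("slow", "low"): "sad",
}


def lookup_user_state(rate: float, mean_pitch_hz: float) -> str:
    """Quantize two utterance parameters and look up the corresponding user state."""
    key = (
        "fast" if rate >= RATE_THRESHOLD else "slow",
        "high" if mean_pitch_hz >= PITCH_THRESHOLD else "low",
    )
    # Combinations missing from the table fall back to an assumed neutral state.
    return STATE_TABLE.get(key, "neutral")
```

A table trades the flexibility of a continuous vector mapping for simplicity. It works only if the bins separate the user states well, which is the condition the paragraph states for any such mapping.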